This tutorial includes 7 figures:
Univariate: Mozambique Community Health Workers contribuition on Family Planning
Univariate: Province level gap to achieve under five mortality reduction goals
Bivariate: Under Five Mortality Distribution by Region in Mozamique
Multivariate: Radar plot- Main study variables by province
Multivariate: Factors that may influence the mortality rate in Mozambique over time
Multivariate: Regression Confidence Intervals - Effect of Health Human resources and Service Utilization on Child Mortality
Maps: Change in under 5 mortality rate from 2000-2010 by province
First, we bring in the dataset we will use and load libraries.
knitr::opts_chunk$set(echo = TRUE, warnings=FALSE, message=FALSE)
if (!require(ggrepel)) install.packages("ggrepel", repos = "http://cran.us.r-project.org")
library(ggplot2)
library(ggrepel)
library(foreign)
library(haven)
link="https://github.com/quinhasf/pubpol-599/raw/master/ape_analysis.dta"
chw_fp <- read_dta(url(link))
For this final project, we used data from two sources. The first is the Under Five Mortality Dataset. The second is the Community Health Workers dataset.
Under Five Mortality dataset - Provincial-level under-5, infant, and neonatal mortality from 2000 to 2010 in Mozambique was estimated using data from the 2003 and 2011 DHS and the 2008 Multiple Indicator Cluster Survey (MICS). These three datasets were merged and used to calculate the provincial-level probability of a child dying for each year of the 11-year period using direct life-table estimation methods.
Community Health Workers dataset - To understand the extent to which the program implementation was achieving the expected results we used data from two main sources (aggregated in one dataset), the Health Information System and the population census. The HIS system provided the number of CHW by province and new users of family planning. We abstracted from the census the number of women in reproductive age.
You can access the github repository HERE In the repository, you can download the R markdown file used to create this page as well as the two main datasets used, and shapefiles for mapping.
>>>>>>> 78fdaf894d5e1b1a66db6130c866345650fdd527Before we begin to create the visualizations, we will need to install and load the necessary libraries.
library(ggplot2)
library(foreign)
library(haven)
library(uwIntroStats)
library(dplyr)
library(readr)
library(ggiraph)
library(ggiraphExtra)
library(foreign)
library(dotwhisker)
library(broom)
library(dplyr)
library(openxlsx)
#This line tells R how to treat the chunks when knitting the markdown file.
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=TRUE)
Next, we will need to bring in the data and tell R to create a new dataframe. For the first plot, we will use the Community Health Workers Dataset.
We will be using the ggplot package to create the visualization. Ggplot can only read the data in a dataframe (as opposed to a table/vector).
#First we only keep the relevant variables and put it in a new data frame
tableFreq <- data.frame(chw_fp[c('province', 'ape_contrib')])
#Give the data frame columns a name
names(tableFreq)=c("province","ape_contrib")
We want our visualization to be ordered, so we create a data frame that is ordered.
tableFreqO=tableFreq[order(tableFreq$ape_contrib),]
The target contribution for community health workers to family planning is 10. So, we add new variables to the data frame indicating whether each province is above the target or below the target.
tableFreqO$gap=tableFreqO$ape_contrib-10
tableFreqO$Target=ifelse(tableFreqO$gap>0,"Above Target","Below Target")
We want to define various plot elements such as the title, subtitle, caption, and source.
loli_title = "Mozambique Community Health Workers contribuition on Family Planning"
loli_subtitle = "2017 province level gap on CHW contribuition"
loli_caption = "Fig.1: Represents the contirbution of each province to achieve the 10% target (centered at 0)
in 2017 (Gap analysis). Provinces are ploted from low to high perfomance.
Source:Health Information System"
Finally, we are ready to create the lollipop plot!
First a basic plot.
base = ggplot(tableFreqO, aes(province,gap,color=Target,
label = round(gap,1)))
lolliplot1=base + geom_segment(aes(y =0,
x = province,
yend = gap,
xend = province))
lolliplot2=lolliplot1 + geom_point()
lolliplot3= lolliplot2 + scale_x_discrete(limits=tableFreqO$province) #key element
Then we want to properly label the axis, show specific numbers, and generally and make it readable. We also want to include the various plot elements we defined before such as title, caption, source etc.
lollitplot4 = lolliplot3 + geom_text(size=3, nudge_x=0.35, nudge_y=0.1,show.legend = FALSE) +
labs(title = loli_title,
subtitle = loli_subtitle,
x ="Province",
y = "% points of the FP GAP",
caption = loli_caption) +
theme(panel.background = element_rect(fill = "gray98",
colour = "black"),
plot.caption = element_text(hjust = 0),
plot.title = element_text(hjust = 0.5),
plot.subtitle = element_text(hjust=0.5),
legend.box.just = c("right","center"),
axis.text.x = element_text(size=7, angle = 60, vjust = 1, hjust=1)) +
geom_hline(yintercept=0,
linetype="dashed",
color = "black",
size= 0.9,
alpha= 0.8)
lollitplot4
Let’s do it again, but with a new dataset. This dataset is about child mortality rates in Mozambique. Include the link and tell R to read it.
link="https://github.com/ihsun-uw/Group3_Final_Project/raw/master/child_mortality.dta"
df <- read_dta(url(link))
This dataset includes information across different years. For this plot, we are only looking at 2010. So, as before, we want to create a dataframe with relevant information. This relevant information includes whether or not each province achieved its target for child mortality rates.
df_2010 <- subset(df, year==2010)
df_2010=df_2010[order(df_2010$U5M_COMB),]
df_2010$gap <- df_2010$U5M_COMB - 75
df_2010$Target=ifelse(df_2010$gap>0,"-Not Achieved","Achieved")
Let’s define some other plot elements such as title, source etc.
loli_title2 = "Mozambique 2010 Gap analysis to achieve MDG 4 Goal"
loli_subtitle2 = "Province level gap to achieve under five mortality reduction goal"
loli_caption2 = "Fig.2: Represents province level gap to achieve 2/3 reduction on under five mortality rate (75 per 1000 live births target centered at 0).
Source:Demographic Health Surveys"
Now let’s create our lollipop plot! This time achieving the target is below the reference line in the plot whereas in our first lolipop plot, provinces achieving the target were above the reference line. Color can be helpful to indicate what is positive or negative. In both cases, red indicates that a province is below the target rate.
base = ggplot(df_2010,
aes(x=reorder(provname, gap),
y=gap,
color=Target,
label = round(gap,1)))
lolliplot1=base + geom_segment(aes(y =0,
x = provname,
yend = gap,
xend = provname))
lolliplot2=lolliplot1 + geom_point()
lolliplot3= lolliplot2 + scale_x_discrete(limits=df_2010$provname)
lollitplot4 = lolliplot3 + geom_text(nudge_x=0.35, nudge_y=0.1,show.legend = F, size=3) +
labs(title = loli_title2,
subtitle = loli_subtitle2,
x ="Province",
y = "U5 mortality rate gap (per 1000 LB)",
caption = loli_caption2) +
theme(panel.background = element_rect(fill = "grey96",
colour = "grey50"),
plot.caption = element_text(hjust = 0, size = 8),
plot.title = element_text(hjust = 0.5, size=12, face="bold"),
plot.subtitle = element_text(hjust=0.5, face="bold"),
legend.box.just = c("right","center"),
axis.text.y = element_text(size=7),
axis.text.x = element_text(size=7, angle = 45, vjust = 1, hjust=1),
axis.title.x = element_text(size=10),
axis.title.y = element_text(size=8)) +
geom_hline(yintercept=0,
linetype="dashed",
color = "black",
size= 0.9,
alpha= 0.8)
lollitplot4
<<<<<<< HEAD
Let’s create a few box plots to show the distribution of under five mortality in the difference regions in Mozambique.
So we don’t mess with the original data, let’s create a new dataframe.
child_mort=df
As always, lets define a few misc. plot elements
box_title = "Under Five Mortality Distribuition by Region in Mozambique"
box_caption = "Fig.3 Represents the distribution of under five mortality rates (per 1000 LB) in different regions by province.
Source:Demographic Health Surveys"
Now we can plot. Note that we wanted the plot to be ordered with provinces that had the highest child mortality rates on the left and in decreasing order to the right.
base = ggplot(df, aes(x=reorder(provname,-U5M_COMB), y=U5M_COMB)) #This is where we designate that we want the plot to be ordered
biv_boxplot = base + geom_boxplot(varwidth=T, fill="grey80") +
labs(title=box_title,
caption=box_caption,
x="Province",
y="Under Five Mortality (per thousand)") +
theme(panel.background = element_rect(fill = "gray98",
colour = "black"),
plot.caption = element_text(hjust = 0),
plot.title = element_text(hjust = 0.5),
<<<<<<< HEAD
plot.caption = element_text(hjust=0))
box4+ coord_flip()
scat1_title = " title..."
base = ggplot(chw_fp, aes(x=report_ape, y=ape_contrib, label=province)) +
geom_point() + geom_text_repel()
scat1 = base + labs(title=scat1_title,
x ="Total women of reproductive age",
y = 'Community health worker contribution',
caption = source)
scat1
scat2_title = " Average Number of Active CHW to Each Province's Contribution"
base111=ggplot(data=chw_fp,aes (x=report_ape,
y=ape_contrib, label=province))+geom_point(size=3)+ geom_text_repel()
base111=base111+labs(title=scat2_title,
x ="Average Number of Active CHW",
y = 'Contribution in Each Province',
caption = source)
base111
Let’s create some mutivariate plots. The first plot is a radar plot.
First we will need to create a data frame with the relevant information. We want the average of various variables (Health Professional Present at Birth, Health Worker Density, Midwife Density, Medical Doctor Density) by province.
df1=df
df_aggre1 <- aggregate(cbind(birthatend, hwdensity, midwifedensity, mddensity)~provname, data=df1, FUN=mean )
We are choosing to exclude one outlier province. The capital of Mozambique is doing well in all of these different factors
df_aggre1 <- df_aggre1[-c(5), ] #Drop 5th observation
We want our various radar plots to appear in order, so we want to rank the provinces by these factors
# get minimun value by row
df_aggre1$min=apply(df_aggre1[,c(2:5)],1,min)
# turn this min values into a ranking
df_aggre1$min=rank(df_aggre1$min,ties.method ='first' )
# order city by ranking
prov_fact=as.factor(df_aggre1[order(df_aggre1$min),]$provname)
# turn city into ordered factor
df_aggre1$provname=factor(df_aggre1$provname,
levels= prov_fact,
labels = prov_fact,
ordered = T)
Now that it is ordered, we can clean up the data frame before we plot. We don’t need the ranks and we want the column names to be labeled properly so they make sense in the plot.
# delete column with ranks
df_aggre1$min=NULL
colnames(df_aggre1) <- c("provname", "Health Professional Present at Birth", "Health Worker Density", "Midwife Density", "Medical Doctor Density")
Now we can create our radar plot. Remember to give you plot a title and appropriate caption!
radar_title= "Radar plot: Province Main Study Variables"
radar_caption = "Fig.4: Describes how provinces are performing on study variables : Health professional present at birth, health worker density, midwife density, and medical doctor density.
Source:Demographic Health Surveys"
base = ggRadar(df_aggre1,aes(group='provname'),legend.position="none")
radar1 = base + facet_wrap(~provname,nrow =2) +
labs(title = radar_title,
caption = radar_caption) +
theme(panel.background = element_rect(fill = "gray90"),
plot.caption = element_text(hjust = 0, size = 10),
plot.title = element_text(hjust = 0.5, size=14))
radar1
We can also create a multivariate plot to look at what is happening with these various factors over time.
base = ggplot(df1, aes()) +
geom_point(aes(x=year, y=hwdensity, color="Health Worker Density"), size = 0.8, alpha=1/3) +
geom_smooth(aes(x=year, y=hwdensity), method="loess", se=T, size=1, color="purple") +
geom_point(aes(x=year, y=midwifedensity, color= "Midwife Density"), size = 0.8, alpha=1/4) +
geom_smooth(aes(x=year, y=midwifedensity), method="loess", se=T, size=1, color="green") +
geom_point(aes(x=year, y=mddensity, color= "Medical Doctor Density"), size = 0.8, alpha=1/5) +
geom_smooth(aes(x=year, y=mddensity), method="loess", se=T, size=1, color="red")
Create plot of factors that can affect mortality rates
scat_title = "Factors that may Influence the Mortality Rate in Mozambique over time"
scat_caption = "Fig. 5 This figure represents trends in three key factors (health worker density, midwife density, and medical
doctor density) from 2000 to 2010.
Source: Demographic Health Surveys"
scat = base + geom_smooth(aes(x=year, y=mddensity), method="loess", se=T, size=1, color="red") +
labs(title = scat_title,
caption = scat_caption,
y =" Variable Density (per 1000 live births)",
x = "Years")+
theme(legend.position = "bottom",
plot.caption = element_text(hjust = 0),
plot.title = element_text(hjust = 0.5)) +
scale_x_continuous(breaks = c(2000,2002,2004,2006,2008,2010)) + theme(legend.title=element_blank())
scat
Now that we have examined these different variables. Let’s run a regression to see if there is any statistically significant association between these factors and child mortality rates.
We will run 3 different models. In the first model, our left hand side variable is neonatal mortality. In the second model, we use child mortality. In the third model, we use under-five mortality.
After running the regression models, we want to save the information in data frames to plot.
##Model 1
model1=lm(NEONAT_COMB ~ birthatend + midwifedensity + hwdensity + facpop, data=df1)
model1_t = tidy(model1) %>%
mutate(model = "Neonatal Mortality")
##Model 2
model2=lm(INFANT_COMB ~ birthatend + midwifedensity + hwdensity + facpop, data=df1)
model2_t = tidy(model2) %>%
mutate(model = "Child Mortality")
##Model 3
model3=lm(U5M_COMB ~ birthatend + midwifedensity + hwdensity + facpop, data=df1)
model3_t = tidy(model3) %>%
mutate(model = "Under-Five Mortality")
Now we combine the data frames into one.
allModels=rbind(model1_t, model2_t, model3_t)
Now we can create the visualizatios of the regression coefficients with confidence intervals so that we can easily see which factors statistically significantly impact mortality rates.
reg_title = "Effect of Health Human resources and Service
Utilization on Child Mortality"
reg_caption = "Fig.6: Show regression coeficientes and 95% CI for birth atendance,
midewife desnsity and total health human resources on child mortality.
Source: DHS"
We also want to change the axis to use titles that make sense to a reader.
allModels$term[allModels$term=="birthatend"] <- "Health Professional Present at Birth"
allModels$term[allModels$term=="midwifedensity"] <- "Midwife Density"
allModels$term[allModels$term=="hwdensity"] <- "Health Worker Density"
allModels$term[allModels$term=="facpop"] <- "Health Facility Density"
Plot the three models side by side and change the colors to black.
regplot = dwplot(allModels, dot_args = list(color="black"), whisker_args = list(color="black")) + facet_wrap(~model)
We do not need the legend because each pannel is titled. We also want to add a reference line at 0 to quickly see which variables are statistically significantly different than 0.
regplot1 = regplot + theme(legend.position="none") +geom_vline(xintercept = 0,
colour = "black",
linetype = 4)
Give the plot a title and caption
regplot2 = regplot1 + labs(title = reg_title,
caption = reg_caption) +
theme(panel.background = element_rect(fill = "gray97",
colour = "black"),
plot.caption = element_text(hjust = 0, size = 8),
plot.title = element_text(hjust = 0.5, size=12)) +
theme(axis.text.y = element_text(angle = 45, vjust = 0.6))
regplot2
Bringing in mortality and map data
#link="https://github.com/quinhasf/pubpol-599/raw/master/ape_analysis.dta"
#chw_fp <- read_dta(url(link))
link="https://github.com/ihsun-uw/Group3_Final_Project/raw/master/child_mortality.dta"
df <- read_dta(url(link))
zip_mozmap_SHP = "https://github.com/ihsun-uw/Group3_Final_Project/raw/master/Mozambique%20shape%20maps.zip"
library(utils)
temp=tempfile()
download.file(zip_mozmap_SHP, temp)
unzip(temp)
# notice the parameters use in the chunk!!
library(rgdal)
mozzipMap <- readOGR("MOZ-level_1.shp",stringsAsFactors=F)
<<<<<<< HEAD
Creating the first map of the under 5 mortality rate by province
======= >>>>>>> 78fdaf894d5e1b1a66db6130c866345650fdd527layerContrib_2=merge(mozzipMap,df_new, by.x='province_num', by.y='province_num',all.x=F)
varToPlot_2=layerContrib_2$U5M_COMB
numberOfClasses=5
colorForScale='OrRd'
colors = brewer.pal(numberOfClasses, colorForScale)
intervals <- classIntervals(varToPlot_2, numberOfClasses,
style = "kmeans",
dataPrecision = 0)
colorPallette <- findColours(intervals, colors)
<<<<<<< HEAD
legendText="u5 mortality rate"
shrinkLegend=1
title="Under 5 mortality rate in Mozambique by province (2010)"
# first the ORIGINAL to signal missing values:
plot(mozzipMap,col='red',main=title, border="black", lwd=1)
title(sub=source, adj=0)
# now the info on contributions
plot(layerContrib_2, col = colorPallette,border=NA,add=T) #add
# this uses all previous information
legend('topright',
legend = names(attr(colorPallette, "table")), #values
fill = attr(colorPallette, "palette"), #colors
cex = shrinkLegend, #size
bty = "n", # no box
title=legendText)
Generating one more data set for mapping - change in u5 MR
layerContrib_3=merge(mozzipMap,df_new, by.x='province_num', by.y='province_num',all.x=F)
varToPlot_3=layerContrib_2$u5_change
numberOfClasses=5
colorForScale='YlOrRd'
colors = brewer.pal(numberOfClasses, colorForScale)
intervals <- classIntervals(varToPlot_3, numberOfClasses,
style = "kmeans",
dataPrecision = 0)
colorPallette <- findColours(intervals, colors)
<<<<<<< HEAD
=======
>>>>>>> 78fdaf894d5e1b1a66db6130c866345650fdd527
legendText="Change in u5 mortality"
shrinkLegend=1
title="Percent change in u5 mortality rate in Mozambique by province \n(2000-2010)"
# first the ORIGINAL to signal missing values:
plot(mozzipMap,col='red',main=title, border="black", lwd=1)
<<<<<<< HEAD
title(sub=source, adj=0)
=======
>>>>>>> 78fdaf894d5e1b1a66db6130c866345650fdd527
# now the info on contributions
plot(layerContrib_3, col = colorPallette,border=NA,add=T) #add
# this uses all previous information
legend('topright',
legend = names(attr(colorPallette, "table")), #values
fill = attr(colorPallette, "palette"), #colors
cex = shrinkLegend, #size
bty = "n", # no box
title=legendText)
<<<<<<< HEAD